Identifying free text plagiarism based on semantic similarity

نویسندگان

  • George Tsatsaronis
  • Iraklis Varlamis
  • Andreas Giannakoulopoulos
  • Nikolaos Kanellopoulos
چکیده

It is common knowledge that plagiarism in academia goes as back in time as research itself. However, in the last two decades this phenomenon of academic deception has turned into an academic plague. Undoubtedly, the rapid expansion of the Web and the vast amount of publicly available information and documents facilitate the unethical malpractice of computer-aided plagiarism, which in turn has inflated the problem. Anti-plagiarism techniques build upon technological solutions and especially the development of task-specific software. The role of anti-plagiarism software for text is to process a document and identify the pieces of text that have been reproduced from another source. This work presents a semantic-based approach to text-plagiarism detection which improves the efficiency of traditional keyword matching techniques. Our semantic matching technique is able to detect a larger variety of paraphrases, including the use of synonym terms, the repositioning of words in the sentences etc. We evaluate our methodology in a dataset comprising positive and negative plagiarism samples and present comparative results of both supervised and unsupervised methods.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

English-Persian Plagiarism Detection based on a Semantic Approach

Plagiarism which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them” poses a major challenge to knowledge spread publication. Plagiarism has been placed in four categories of direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism which is sometimes referred to as cross-li...

متن کامل

Citation-Based Plagiarism Detection: Practicability on a Large-Scale Scientific Corpus

The automated detection of plagiarism is an information retrieval task of increasing importance as the volume of readily accessible information on the web expands. A major shortcoming of current automated plagiarism detection approaches is their dependence on high character-based similarity. As a result, heavily disguised plagiarism forms, such as paraphrases, translated plagiarism, or structur...

متن کامل

Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting

With due respect to the authors’ rights, plagiarism detection, is one of the critical problems in the field of text-mining that many researchers are interested in. This issue is considered as a serious one in high academic institutions. There exist language-free tools which do not yield any reliable results since the special features of every language are ignored in them. Considering the paucit...

متن کامل

Web-based Demonstration of Semantic Similarity Detection Using Citation Pattern Visualization for a Cross Language Plagiarism Case

In a previous paper, we showed that analyzing citation patterns in the well-known plagiarized thesis by K. T. zu Guttenberg clearly outperformed current detection methods in identifying cross-language plagiarism. However, the experiment was a proof of concept and we did not provide a prototype. This paper presents a fully functional, web-based visualization of citation patterns for this verifie...

متن کامل

Analyzing Semantic Concept Paerns to Detect Academic Plagiarism

Detecting academic plagiarism is a pressing problem, e.g., for educational and research institutions, funding agencies, and academic publishers. Existing plagiarism detection systems reliably identify (nearly) copied text, but o‰en fail to detect disguised forms of academic plagiarism, such as paraphrases, translations, and idea plagiarism. We present Semantic Concept PaŠern Analysis an approac...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010